Abstract

We study the large deviations performance of consensus+innovations distributed detection over noisy
networks, where sensors at a time step $k$ cooperate with immediate neighbors (consensus) and assimilate
their new observations (innovation). We show that, even under noisy communication, all sensors can
achieve exponential decay $e^{-k C_{\mathrm{dis}}}$ of the detection error probability, even when certain (or most) sensors
cannot detect the event of interest in isolation. We achieve this by designing a single time scale stochastic
approximation type distributed detector with the optimal weight sequence $\{\alpha_k\}$, by which sensors weigh
their neighbors' messages. The optimal design of $\{\alpha_k\}$ balances the opposing effects of communication
noise and information flow from neighbors: a larger, slowly decaying $\alpha_k$ improves information flow but
injects more communication noise. Further, we quantify the best achievable $C_{\mathrm{dis}}$ as a function of the
sensing signal and noise, communication noise, and network connectivity. Finally, we find a threshold on
the communication noise power below which a sensor that can detect the event in isolation still improves
its detection by cooperation through noisy links.
I. Introduction

Consider a generic network of sensors that sense the environment to detect an event of interest.
In centralized detection, the measurements of all sensors at all times $k$ are available at the detector
(fusion center). Under appropriate conditions, the probability of error $P^e(k)$ of the centralized minimum
probability of error detector decays in $k$ at an exponential rate, $P^e(k) \sim e^{-kC}$, where $C$ is the (centralized)
Chernoff information. In this paper, we study the equivalent question of the exponential rate of decay of
the probability of error for distributed detection at each local sensor. We consider this question when
the (local) communication among sensors is through noisy links.

To be specific, we study consensus+innovations distributed algorithms like, for example, the LMS and
RLS adaptive algorithms in [1], [2], [3], the detectors in [4], [5], or the estimators in [6]. In
consensus+innovations distributed algorithms, at time $k$, each sensor updates its state 1) by a weighted average
of the states of its neighbors (consensus); and 2) by incorporating its local measurement (innovations):

$\mathrm{state}_{k+1} = \mathrm{state}_k + \gamma_k^1\, \mathrm{consensus}_k + \gamma_k^2\, \mathrm{innovations}_k.$ (1)

Consensus+innovations detectors like in (1) are distributed, stochastic approximation type algorithms,
but particular algorithms make different choices of the time-decaying weight sequences $\gamma_k^i$, $i = 1, 2$,
in (1) by which sensors weigh the consensus (their neighbors' messages) and the innovations (their own
measurements) terms at each time $k$: [1], [2], [3] set $\gamma_k^i = \mu$, $i = 1, 2$; in [4] they vanish at the same
rate; while [5], [6] consider single but also mixed scale algorithms where these weight sequences vanish
at different rates. We will show that the key to achieving exponential decay of the distributed detector's $P^e$ at
all sensors is the suitable design of the weights $\gamma_k^i$, $i = 1, 2$, in (1).
This paper addresses three natural questions:

1) Centralized versus distributed (through noisy links) detection: under which, if any, conditions can
consensus+innovations distributed detection achieve exponentially fast decay of the (detection) error
probability at each sensor, $P^e \sim e^{-k C_{\mathrm{dis}}}$, where the best possible decay rate $C_{\mathrm{dis}}$ is not necessarily equal to
the centralized Chernoff information $C$? In other words, when the sensors cooperate through noisy
links, can distributed detection achieve at every sensor an exponential rate of decay like centralized
detection? We answer this question affirmatively, under mild structural conditions, by careful design
of the weight sequences in (1).

2) Can cooperation through noisy links help: How close can $C_{\mathrm{dis}}$ approach the corresponding
(centralized) Chernoff information $C$? We solve this by designing the best weight sequences $\gamma_k^i$, $i = 1, 2$,
that maximize $C_{\mathrm{dis}}$. We explicitly quantify the optimal $C_{\mathrm{dis}}$ as a function of a sensing signal-to-noise
ratio SSNR and a communication signal-to-noise ratio CSNR that we will define, and the network
algebraic connectivity.^1 Our analysis reveals opposing effects: a small and fast decaying weight $\gamma_k^1$
injects less communication noise, but also reduces the information flow from neighbors; the optimal
weights strike the best balance between these two effects.

3) SSNR versus CSNR, or how much communication noise can distributed detection sustain: What is the
highest communication noise level, i.e., lowest CSNR, for which cooperation helps? Let sensor $i$
be the best sensor among all locally detectable sensors and assume that, without cooperation, its
$P^e_i \sim e^{-k C_i}$. Can cooperation over noisy links make the worst sensor under communication better
than $i$, the best one without cooperation? We explicitly find a threshold on the ratio CSNR/SSNR
above which communication pays off in the latter sense; the threshold is a function of the network
algebraic connectivity.

Brief comment on the literature. Consensus+innovations distributed algorithms as in (1) or in [4], [1],
[2], [3], [5], [6], [8], which interleave consensus and innovations at the same time instant, contrast with
decentralized parallel fusion architectures, e.g., [9], [10], [11], [12], [13], where all sensors communicate
with a fusion sensor, and with consensus-based detection schemes (no fusion sensor), for example, [14],
[15], [16], [17], where sensors in the network 1) initially collect a single snapshot of measurements,
and 2) subsequently run the consensus algorithm to fuse their decision rules.

We consider noisy communications among sensors. Communication imperfections in consensus-based
detection in sensor networks are usually modeled via intermittent link failures and additive noise [18].
For consensus+innovations distributed algorithms, the LMS and RLS adaptive algorithms in [1], [2],
[3] and the distributed change detection algorithm in [8] consider neither link failures nor additive
noise. References [4], [19], [20] consider link failures but no additive communication noise. Reference
[21] considers deterministically time varying networks. Reference [6] is concerned with estimation and
considers a very general model that includes sensor failures, link failures, and various degrees of either
quantized or noisy communications. To the best of our knowledge, and within the consensus+innovations
detectors, only reference [5] and now this paper consider additive noise in the communications among
sensors, but no link failures. We highlight the main differences between our work and [5].

Reference [5] proposes a consensus+innovations distributed detector that it refers to as $\mathcal{MD}$. Algorithm

^1 The algebraic connectivity is the second smallest eigenvalue of the graph Laplacian matrix [7]; it measures the speed of
averaging across the network: larger algebraic connectivity means faster averaging.
$\mathcal{MD}$ assumes very general data distributions: temporally independent, spatially correlated sensing noise
and temporally independent, spatially correlated additive communication noise, both with generic
distributions with finite second moments. Under global detectability and connectedness assumptions, [5] shows
that the error probability $P^e_i$ of $\mathcal{MD}$ decays to zero at all nodes $i$, but [5] only shows an exponential decay rate of
the error probability for a modified scheme, $\mathcal{SD}$, when the noises are Gaussian and all sensors are locally
detectable, with equal Chernoff informations $C_i > 0$^2 ([5], Corollary 12). In fact, as we show in
Appendix B of this paper, the probability of error for $\mathcal{MD}$ is not exponential; it is instead subexponential^3,
i.e., the rate is strictly slower than exponential, when the $C_i$'s are not equal, with possibly some $C_i$'s
equal to zero. The subexponential rate of the $\mathcal{MD}$ and $\mathcal{SD}$ algorithms (with unequal $C_i$'s) is due to the
decay rates assumed by these algorithms for the stochastic approximation weight sequences. In contrast,
in the consensus+innovations algorithm that we propose, we carefully craft these weight sequences; this
enables us to show, for the Gaussian problem and under global detectability and connectedness, that our
distributed detector achieves an exponential decay rate for $P^e$ at every sensor, regardless of whether the
$C_i$'s are equal or unequal, where some can possibly be zero. Further, we optimize the weight sequences so that sensors
achieve the maximum payoff from their (noisy) cooperation with other sensors. We derive our results on
the $P^e$ under Gaussian assumptions on the sensing and communication noises, but our results extend,
to a certain degree, to non-Gaussian (time-independent and space-independent) zero mean sensing
noise with finite second moment and to non-Gaussian (time-independent and space-correlated) zero
mean communication noise with finite second moment.

The Gaussian assumptions allow us to completely characterize the rate of decay of the error probability
solely on the basis of the first two moments (mean and variance) of the node's decision variable or state,
say $x_i(k)$. With non-Gaussian noises, the first two moments no longer suffice to determine the rate of
decay of the error probability $P^e$, but they still represent a good measure of detection performance. In
this case, we can still show that the (local) detector signal-to-noise ratio $\mathrm{DSNR}_i(k)$ at each sensor $i$, which
we define as the ratio of the square of the mean over the variance of the sensor $i$ state $x_i(k)$, grows at
the same rate $\sim k$ as for the optimal centralized detector.

We now relate this paper to our prior work on distributed detection [19], [20]. While [19], [20]
study the effect of link failures on detection performance, this paper addresses additive communication 
noise in the links among sensors. Our analysis here reveals that communication noise has an effect 

^2 Sensor $i$ can detect the event individually (is locally detectable) if and only if $C_i > 0$; see ahead Definition 1 and Fact 2.

^3 We show in Appendix B that $\max_{i=1,\dots,N} P^e_i(k)$, the worst error probability at time $k$ among all $N$ sensors, is at least
$e^{-c k^{\tau}}$, where $\tau \in (0.5, 1)$ and $c > 0$.
that is qualitatively different from that of link failures; with link failures, the more communication that
is actually achieved among sensors, the better the error performance, since when communication does
happen sensors receive their neighbors' decision variables unencumbered by noise. On the other hand,
communication noise leads to a clear tradeoff between communication noise and information flow (the degree
to which consensus helps), with a cooperation payoff threshold on the CSNR. To show these results,
the analysis we develop here is very different from the analysis we advanced in [19], [20]. In [19],
[20], we consider independent identically distributed (i.i.d.) averaging matrices $W(k)$ (and hence the
distribution of the $W(k)$ is time invariant) and no communication noise. In contrast, this paper considers
time-decaying stochastic approximation weights (and hence time varying weight matrices $W(k)$) and
additive communication noise; these additional challenges do not allow for our tools in [19], [20] and
demand new analysis. A final comment distinguishes our methods here from those in [5]. Reference [5]
uses standard stochastic approximation techniques [22] that yield the exact asymptotic covariance of
the decision variable vector (when the local Chernoff informations are equal); the asymptotic covariance
is given by a difficult to interpret matrix integration formula. In contrast, we do not pursue the exact
asymptotic covariance of the decision variable, but get, instead, tight, simple, easy to interpret lower and
upper bounds by exploiting the natural separability between the communication noise and the information
flow (averaging) effects.

Paper organization. The next paragraph introduces notation. Section II describes the problem model
and presents our distributed detector. Section III states our modeling assumptions and gives preliminary
analysis. Section IV presents our main results on the asymptotic performance of our distributed detector.
Section V proves our main results. Section VI presents extensions to the non-Gaussian case. Finally,
Section VII concludes the paper. Appendices A-B provide remaining proofs.

Notation. We denote by: $A_{ij}$ or $[A]_{ij}$ (as appropriate) the $(i,j)$-th entry of a matrix $A$; $a_i$ or $[a]_i$ the $i$-th
entry of a vector $a$; $A^\top$ the transpose of $A$; $I$, $1$, and $e_i$, respectively, the identity matrix, the column
vector with unit entries, and the $i$-th column of $I$; $J := (1/N)\, 1 1^\top$ the $N \times N$ ideal averaging matrix;
$\|\cdot\|_l$ the vector (respectively, matrix) $l$-norm; $\|\cdot\| = \|\cdot\|_2$ the Euclidean (respectively, spectral) norm;
$\lambda_i(\cdot)$ and $\mathrm{tr}(\cdot)$ the $i$-th smallest eigenvalue and the trace of a matrix; $\mathrm{Diag}(a)$ the diagonal matrix with
the diagonal equal to the vector $a$; $|\mathcal{A}|$ the cardinality of $\mathcal{A}$; $E[\cdot]$, $\mathrm{Var}(\cdot)$, $\mathrm{Cov}(\cdot)$, and $P(\cdot)$ the expected
value, the variance, the covariance, and probability operators, respectively; $\chi_{\mathcal{A}}$ the indicator function of
the event $\mathcal{A}$; $\mathcal{N}(\mu, S)$ the normal distribution with mean $\mu$ and covariance $S$; $Q(\cdot)$ the Q-function, i.e.,
the function that calculates the right tail probability of the standard normal distribution:

$Q(t) = \frac{1}{\sqrt{2\pi}} \int_t^{+\infty} e^{-u^2/2}\, du, \quad t \in \mathbb{R}.$ (2)



We also make use of the standard $\Omega$ and $O$ notations: $f(k) = \Omega(g(k))$ stands for the existence of a $K > 0$
such that $f(k) \geq c\, g(k)$, for some $c > 0$, for all $k \geq K$; and $f(k) = O(g(k))$ means the existence of a $K > 0$
such that $f(k) \leq c\, g(k)$, for some $c > 0$, for all $k \geq K$.
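The Q-function in (2) has no closed form, but it can be evaluated through the complementary error function via the standard identity $Q(t) = \frac{1}{2}\,\mathrm{erfc}(t/\sqrt{2})$. The following is a minimal sketch; the function name `q_function` is ours and does not come from the paper.

```python
import math


def q_function(t):
    """Right-tail probability of the standard normal distribution, Q(t) in (2).

    Uses the identity Q(t) = 0.5 * erfc(t / sqrt(2)).
    """
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Basic properties: Q(0) = 1/2 and Q(-t) = 1 - Q(t).
q_at_zero = q_function(0.0)
symmetry_gap = q_function(-1.3) + q_function(1.3) - 1.0
```

Using `erfc` rather than `1 - erf` keeps the tail values accurate for large `t`, which matters when studying exponential decay rates.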



II. Binary Hypotheses Testing: Centralized and Distributed

This section presents the network model and our consensus+innovations distributed detector whose
performance we analyze in Section IV. The current section also considers the centralized and isolated
sensor detectors for benchmarking our consensus+innovations detector and defines certain relevant
signal-to-noise ratios.

A. Network

We consider a network of $N$ sensors. The topology of the network defines who can communicate with
whom and is described by a simple (no self or multiple links), undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$
is the set of sensors with $|\mathcal{V}| = N$, and $\mathcal{E}$ is the set of links or communication channels among sensors:
the link between sensors $i$ and $j$ is represented in the graph by $\{i,j\} \in \mathcal{E}$. The neighborhood set $O_i$ and
the degree $d_i$ of sensor $i$ are $O_i = \{j : \{i,j\} \in \mathcal{E}\}$ and $d_i = |O_i|$, respectively. The $N \times N$ adjacency
matrix is $A = [A_{ij}]$, with $A_{ij} = 1$ if $\{i,j\} \in \mathcal{E}$ and $A_{ij} = 0$ else (with $A_{ii} = 0$, for all $i$). The graph
Laplacian is $\mathcal{L} = D - A$, where $D = \mathrm{Diag}(d_1, \dots, d_N)$. Consider the eigenvalue decomposition

$\mathcal{L} = Q\, \Lambda(\mathcal{L})\, Q^\top,$ (3)

where $\Lambda(\mathcal{L}) = \mathrm{Diag}(\lambda_1(\mathcal{L}), \dots, \lambda_N(\mathcal{L}))$, with the eigenvalues in increasing order, and the columns $q_i$
of $Q$ are the orthonormal eigenvectors of $\mathcal{L}$. It is well known that $\lambda_1(\mathcal{L}) = 0$ and $q_1 = \frac{1}{\sqrt{N}} 1$. Further, $\mathcal{G}$
is connected if and only if the algebraic connectivity $\lambda_2(\mathcal{L}) > 0$ [7].

B. Isolated sensor detector and centralized detector

We consider the known signal in Gaussian noise binary hypotheses test between $H_1$ and $H_0$. At time
$k$, sensor $i$ measures the (scalar) $y_i(k)$:

under $H_l$: $y_i(k) = [m_l]_i + \zeta_i(k)$, $l = 0, 1$,
with prior probabilities $0 < P(H_1) = 1 - P(H_0) < 1$. Here $[m_l]_i$ is a constant known signal and the
sensing noise $\{\zeta_i(k)\}$ is a zero mean (z.m.) independent identically distributed (i.i.d.) Gaussian sequence.
Introduce the vector notation

$y(k) = (y_1(k), \dots, y_N(k))^\top, \quad m_l = ([m_l]_1, \dots, [m_l]_N)^\top, \quad \zeta(k) = (\zeta_1(k), \dots, \zeta_N(k))^\top.$

The covariance of the sensing noise is $S_\zeta = \mathrm{Cov}(\zeta(k))$.

Isolated sensor detector. A sensor working in isolation processes only its own observation. The test
statistic is the local log-likelihood ratio ($\ell$LLR) $\mathcal{D}_i(k)$, $k = 1, 2, \dots$, given by the sum of the instantaneous
$\ell$LLR $L_i(j)$, $j = 1, \dots, k$, where

$L_i(j) = \frac{[m_1 - m_0]_i}{[S_\zeta]_{ii}} \left( y_i(j) - \frac{[m_1 + m_0]_i}{2} \right), \quad \mathcal{D}_i(k) = \sum_{j=1}^{k} L_i(j).$ (4)

The isolated sensor $i$ detector thresholds $\mathcal{D}_i(k)$ against a threshold $T_i(k)$.

Centralized detector. The centralized log-likelihood ratio (cLLR) for the single vector measurement
$y(k)$ (all sensors' measurements are available at the fusion center) is:

$L(k) = (m_1 - m_0)^\top S_\zeta^{-1} \left( y(k) - \frac{m_1 + m_0}{2} \right).$

The cLLR at time $k$ is:

$\mathcal{D}(k) = \sum_{j=1}^{k} L(j).$ (5)

The optimal centralized detector thresholds the cLLR against $T(k)$. For future reference we introduce:

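$\eta(k) = (\eta_1(k), \eta_2(k), \dots, \eta_N(k))^\top$ (6)
$\eta_i(k) = [S_\zeta^{-1}(m_1 - m_0)]_i \left( y_i(k) - \frac{[m_1 + m_0]_i}{2} \right).$ (7)

Conditioned on $H_l$, $l = 0, 1$, the sequence $\eta(k)$ is i.i.d. Gaussian with mean $m_\eta^{(l)}$ and covariance $S_\eta$:

$m_\eta^{(l)} = (-1)^{(l+1)}\, \frac{1}{2}\, \mathrm{Diag}\left( S_\zeta^{-1}(m_1 - m_0) \right) (m_1 - m_0)$ (8)
$S_\eta = \mathrm{Diag}\left( S_\zeta^{-1}(m_1 - m_0) \right) S_\zeta\, \mathrm{Diag}\left( S_\zeta^{-1}(m_1 - m_0) \right).$ (9)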
With (7), we rewrite the cLLR $L(k)$ at time $k$ as the separable sum of the $\eta_i(k)$'s:

$L(k) = \sum_{i=1}^{N} \eta_i(k).$
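The separability of the cLLR can be checked numerically. The sketch below assumes a hypothetical two-sensor network with a correlated $2 \times 2$ sensing covariance (all numbers are illustrative, not from the paper), computes the terms $\eta_i(k)$ as in (7), and compares their sum with the centralized expression $(m_1 - m_0)^\top S_\zeta^{-1}(y(k) - (m_1 + m_0)/2)$.

```python
def solve_2x2(S, b):
    """Return S^{-1} b for a 2x2 matrix S, using the explicit inverse."""
    (s00, s01), (s10, s11) = S
    det = s00 * s11 - s01 * s10
    return [(s11 * b[0] - s01 * b[1]) / det,
            (-s10 * b[0] + s00 * b[1]) / det]


def innovations(y, m0, m1, S):
    """eta_i = [S^{-1}(m1 - m0)]_i * (y_i - (m1_i + m0_i) / 2), as in (7)."""
    delta = [b - a for a, b in zip(m0, m1)]
    w = solve_2x2(S, delta)
    return [w[i] * (y[i] - 0.5 * (m1[i] + m0[i])) for i in range(2)]


def centralized_llr(y, m0, m1, S):
    """L = (m1 - m0)^T S^{-1} (y - (m1 + m0) / 2)."""
    delta = [b - a for a, b in zip(m0, m1)]
    centered = [y[i] - 0.5 * (m1[i] + m0[i]) for i in range(2)]
    z = solve_2x2(S, centered)
    return sum(d * zi for d, zi in zip(delta, z))


# Illustrative values (assumptions): symmetric positive definite covariance.
S_zeta = [[2.0, 0.5], [0.5, 1.0]]
m0, m1 = [0.0, 0.0], [1.0, 0.0]
y = [0.7, -0.2]

eta = innovations(y, m0, m1, S_zeta)
llr = centralized_llr(y, m0, m1, S_zeta)
```

The two quantities agree because $S_\zeta$ is symmetric. Note that in this example the second sensor carries no signal of its own ($[m_1 - m_0]_2 = 0$), yet its $\eta_2$ is nonzero because the sensing noises are spatially correlated.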



C. Distributed detector: Consensus+Innovations 

We now consider the consensus+innovations distributed detector, see (1), with a structure like that
of the distributed estimator in [6] or of the $\mathcal{MD}$ distributed detector in [5]. The key to ours is
our choice of the consensus weight sequence $\gamma_k^1$, which we show to be the optimal one and which leads to
an exponential decay rate of the probability of error of the consensus+innovations distributed detector; this
is not the case in general for the $\mathcal{MD}$ distributed detector in [5], as we show in Appendix B.

To set up the distributed detector, let the decision variable or the current state of sensor $i$ at time $k$ be
$x_i(k)$. Due to the communication noise, when sensor $j$ transmits its state to sensor $i$, sensor $i$ receives
a noisy version:

$x_j(k) + \nu_{ij}(k),$ (10)

where $\nu_{ij}(k)$ is the communication noise. Note that (10) is a high level model, i.e., we do not model
here the physical communication channel, but rather we model the estimation errors at the receiver.

We propose as distributed detector the single time scale, stochastic approximation, consensus+innovations
algorithm where each sensor updates its decision variable in two ways: 1) by consensus, i.e., averaging its
decision variable with the decision variables of its immediate neighbors, the sensors with which it
communicates; and 2) by innovation, i.e., by incorporating the innovation $\eta_i(k)$ in (7), after sensing its
local observation. The consensus+innovation update of $x_i(k)$ is given by:

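$x_i(k+1) = x_i(k) + \underbrace{\frac{k}{k+1}\, \alpha_k \sum_{j \in O_i} \big( (x_j(k) - x_i(k)) + \nu_{ij}(k) \big)}_{\text{consensus}} + \underbrace{\frac{1}{k+1} \big( \eta_i(k+1) - x_i(k) \big)}_{\text{innovations}}$ (11)

$= \frac{k}{k+1} (1 - \alpha_k d_i)\, x_i(k) + \frac{k}{k+1}\, \alpha_k \sum_{j \in O_i} \big( x_j(k) + \nu_{ij}(k) \big) + \frac{1}{k+1}\, \eta_i(k+1),$ (12)

$x_i(1) = \eta_i(1).$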
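One step of the update can be sketched in code. The block below is an illustration under assumed inputs (a three-sensor path graph and arbitrary numbers for the states, innovations, and link noises; none of these come from the paper). It implements the incremental form (11) and the regrouped form (12) side by side so that their equivalence can be checked.

```python
def step_form_11(x, k, alpha_k, neighbors, eta_next, nu):
    """One update in the form (11); nu[i][j] is the noise on the link j -> i."""
    w = k / (k + 1.0)
    out = []
    for i, xi in enumerate(x):
        consensus = sum((x[j] - xi) + nu[i][j] for j in neighbors[i])
        innovation = eta_next[i] - xi
        out.append(xi + w * alpha_k * consensus + innovation / (k + 1.0))
    return out


def step_form_12(x, k, alpha_k, neighbors, eta_next, nu):
    """The same update regrouped as in (12)."""
    w = k / (k + 1.0)
    out = []
    for i, xi in enumerate(x):
        d_i = len(neighbors[i])  # degree of sensor i
        received = sum(x[j] + nu[i][j] for j in neighbors[i])
        out.append(w * (1.0 - alpha_k * d_i) * xi
                   + w * alpha_k * received
                   + eta_next[i] / (k + 1.0))
    return out


# Hypothetical example: path graph 0 - 1 - 2.
neighbors = [[1], [0, 2], [1]]
x = [1.0, -0.5, 2.0]
eta_next = [0.3, 0.1, -0.4]
nu = [{1: 0.05}, {0: -0.02, 2: 0.01}, {1: 0.03}]
k, alpha_k = 3, 0.2

via_11 = step_form_11(x, k, alpha_k, neighbors, eta_next, nu)
via_12 = step_form_12(x, k, alpha_k, neighbors, eta_next, nu)
```

The two forms coincide because $1 - \frac{1}{k+1} = \frac{k}{k+1}$, which lets the $x_i(k)$ terms be collected into the factor $\frac{k}{k+1}(1 - \alpha_k d_i)$.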

We write (12) in matrix form. The communication noise at sensor $i$ from all its neighbors, and the
corresponding vector quantity for all sensors, are (see (12))

$v_i(k) := \sum_{j \in O_i} \nu_{ij}(k), \quad v(k) := (v_1(k), \dots, v_N(k))^\top.$ (13)

Denote also $S_v := \mathrm{Cov}(v(k))$. Likewise, define the vector of the sensors' decision variables or
vector of sensor states $x(k) := (x_1(k), \dots, x_N(k))^\top$ and the time varying, deterministic, averaging
matrix $W(k) := I - \alpha_k \mathcal{L}$, where $\mathcal{L}$ is the graph Laplacian, see Subsection II-A. Then (12) is:

$x(k+1) = \frac{k}{k+1}\, W(k)\, x(k) + \frac{k}{k+1}\, \alpha_k\, v(k) + \frac{1}{k+1}\, \eta(k+1), \quad k = 1, 2, \dots, \quad x(1) = \eta(1),$ (14)

where $\eta(k)$ is given in (6). When the noises are Gaussian, since (12) and (14) are linear, the decision
variables $x_i(k)$ and $x(k)$ are Gaussian. For the vector $x(k)$ of the decision variables $x_i(k)$, the vector $\mu(k)$
of the means $\mu_i(k)$, $1 \leq i \leq N$, under $H_1$ (respectively, $H_0$) and the covariance $S_x(k)$ under either
hypothesis are:

$\mu(k) = E[x(k) \mid H_1] = -E[x(k) \mid H_0] = (\mu_1(k), \mu_2(k), \dots, \mu_N(k))^\top$ (15)
$S_x(k) = \mathrm{Cov}(x(k)).$

We let the diagonal elements of $S_x(k)$ be

$\sigma_i^2(k) = [S_x(k)]_{ii}.$ (16)
Weight sequences $\gamma_k^1$ and $\gamma_k^2$. Comparing (11) with (1), the consensus and innovations weights are

$\gamma_k^1 = \frac{k}{k+1}\, \alpha_k, \quad \gamma_k^2 = \frac{1}{k+1}.$

Due to the communication noise, the $\{\alpha_k\}$ have to be diminishing, i.e., $\alpha_k \to 0$, as pointed out in [18],
[6], [5]. The design of the $\{\alpha_k\}$ will be key to the distributed detector achieving an exponential decay rate
of the error probability: a small and fast-decaying $\alpha_k$ injects low communication noise in the decision
variable $x_i(k)$, but limits the information flow among neighbors (insufficient averaging). We will show
that, for appropriately designed constants $a, b_0 > 0$,

$\alpha_k = \frac{b_0}{a + k}$

balances these two opposing effects, communication noise and information flow. As detailed in Section
IV, a large $b_0$ yields larger noise injection but also greater inter-sensor averaging.

We compare the consensus+innovations distributed detector (12) with the distributed detectors in [4]
and [5]. The detector in [4], referred to as running consensus, uses constant, non-decaying weights $\alpha_k = \alpha$,
which is not suitable for noisy communication. To account for communication noise, references [6], [5]
propose mixed time scale, stochastic approximation type algorithms. In particular, for detection, [5]
proposes the $\mathcal{MD}$ algorithm for the generic case of different signal-to-noise ratios (SNR) at different sensors
and the single time scale $\mathcal{SD}$ algorithm when the SNR is the same at all sensors. The algorithm $\mathcal{MD}$
uses the weight sequences (see eqn. (13), [5])

Consensus : {^}, r G (0.5; 1), ci > (17) 
Innovations: 1x1' '^^'^'^ ^^^^ 

The algorithm $\mathcal{SD}$ uses the same weights for both consensus and innovations (see eqn. (53) in [5]),
equal to (17) and (18) with $\tau = 1$. In contrast with [5], we propose the single time scale algorithm (12)
regardless of whether the local SNRs are mutually different or not. A major contribution here with respect to [5]
is to show that the single time scale algorithm (12) yields better asymptotic detection performance than
the mixed time scale algorithm $\mathcal{MD}$. Our algorithm (12) and $\mathcal{SD}$ are both single time scale. The main
differences between (12) and $\mathcal{SD}$ are that algorithm (12): 1) incorporates $a$ in (II-C); and 2) optimizes
the parameter $b_0$. We will show that (12) exhibits, under appropriate structural conditions, an exponential rate
of decay of the probability of error at every sensor, while $\mathcal{MD}$ in [5] is subexponential; $\mathcal{SD}$ is shown
to be exponential, but only when the sensors are identical, i.e., all operate under the same SNR.

D. Signal-to-noise ratios (SNR) 

We define, for future reference, the following relevant SNRs. 

1. Sensing SNR:

global: $\mathrm{SSNR} := (m_1 - m_0)^\top S_\zeta^{-1} (m_1 - m_0)$ (19)
local: $\mathrm{SSNR}_i := \frac{[m_1 - m_0]_i^2}{[S_\zeta]_{ii}}.$ (20)

2. Detector SNR, for a generic detector gen (either centralized, isolated, or distributed):

$\mathrm{DSNR}_{\mathrm{gen}}(k) := \frac{E^2[\mathcal{D}_{\mathrm{gen}}(k) \mid H_1]}{\mathrm{Var}(\mathcal{D}_{\mathrm{gen}}(k) \mid H_1)},$

where $\mathcal{D}_{\mathrm{gen}}(k)$ is the detector decision variable. We denote the detector SNR for the centralized, isolated,
and distributed detector, respectively, by $\mathrm{DSNR}(k)$, $\mathrm{DSNR}_i(k)$, and $\mathrm{DSNR}_{\mathrm{dis},i}(k)$. We will see later
how the detector SNR determines the error probability for Gaussian detectors; see ahead (25).

3. Communication SNR is a quantity that accounts for the communication noise and plays a role only
with the distributed detector. We define the communication SNR (per sensor) by:

CSNR= II ^ , (21) 
where we recall the communication noise covariance $S_v = \mathrm{Cov}(v(k))$.

Remark. We give a hint why CSNR is defined as in (21) and why it plays a significant role in assessing
distributed detection performance. With our distributed detector, sensors communicate their local decision
variables $x_i(k)$; $x_i(k)$ is a local approximation of the (scaled) centralized decision variable $\frac{1}{Nk}\mathcal{D}(k)$ (see
(5)). The mean of $x_i(k)$ under $H_1$, for large $k$, is close to the mean of $\frac{1}{Nk}\mathcal{D}(k)$, equal to $\frac{1}{2N}\mathrm{SSNR}$ (as
will be shown); the variance of $x_i(k)$, as will be shown, vanishes at rate $1/k$. Hence, CSNR describes
how well, in a sense, the signal $x_i(k)$ competes against the noise $v_i(k)$ in communication, for large $k$.
We also define the communication gain $G_c$ as the ratio of the communication SNR and the average (across
sensors) sensing SNR^4:

_ CSNR _ ^SSNR 
'^"^SSNR" • 

For future reference, we introduce here the following two constants that we will need when assessing 
distributed detection performance: 

^ ._ 2^/]V||mW|| ^ . \\S,\\ 
^ ■ iSSNR ' " ■ ;^SSNR' 



III. Modeling assumptions and Preliminary results 

In this Section, we establish our underlying assumptions and present the asymptotic performance of the 
isolated sensor and centralized detectors. These will be a prelude to our main results on the asymptotic 
performance of the consensus+innovations detector in (12) given in Section IV and proven in Section V 
and Appendix A. 

A. Modeling assumptions 

As mentioned in Section II, the noises are zero mean, Gaussian, spatially correlated but temporally
independent sequences. In Section VI, we will consider the case where the noises are not Gaussian.

Assumption 1 (Gaussian noises) The sensing and the communication noises $\zeta_i(k)$ and $\nu_{ij}(k)$ are zero
mean, Gaussian, spatially correlated and temporally independent noises, and independent of each other:

$\zeta(k) \sim \mathcal{N}(0, S_\zeta), \quad v(k) \sim \mathcal{N}(0, S_v),$

where the vector $v(k)$ is defined in (13) and $S_\zeta$ and $S_v$ are assumed to be positive definite.

^4 Note that CSNR and SSNR are not independent quantities here; larger SSNR means larger CSNR.

In distributed processing, the ability of the sensors or agents to cooperate is fundamental; this is captured
by the connectivity of the network.

Assumption 2 (Network connectedness) The network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is connected.

As is well known, a necessary and sufficient condition for connectedness is $\lambda_2(\mathcal{L}) > 0$, i.e., the algebraic
connectivity of the network is strictly positive. The next assumption is on the weight sequence $\{\alpha_k\}$.
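Assumption 2 can be checked numerically through $\lambda_2(\mathcal{L})$. The sketch below builds the Laplacian $\mathcal{L} = D - A$ from an edge list and estimates $\lambda_2(\mathcal{L})$ with a plain power iteration on $cI - \mathcal{L}$ restricted to vectors orthogonal to the all-ones vector. This numerical method is our choice for a dependency-free illustration, not something the paper prescribes; it assumes at least two nodes and a start vector that is not orthogonal to the eigenvector of $\lambda_2(\mathcal{L})$.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of a simple undirected graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def algebraic_connectivity(n, edges, iters=500):
    """Estimate lambda_2(L) by power iteration on c*I - L, orthogonal to 1."""
    L = laplacian(n, edges)
    # c exceeds the largest eigenvalue of L, which is at most 2 * max degree.
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the component along q_1
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    mean = sum(v) / n
    v = [x - mean for x in v]
    Lv = [sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient of L at the converged vector.
    return sum(a * b for a, b in zip(v, Lv)) / sum(a * a for a in v)


lam2_path = algebraic_connectivity(3, [(0, 1), (1, 2)])   # connected path graph
lam2_split = algebraic_connectivity(4, [(0, 1), (2, 3)])  # two disconnected pairs
```

For the three-node path the Laplacian eigenvalues are 0, 1, and 3, so the estimate should be close to 1; for the disconnected example the second eigenvalue is 0, so Assumption 2 fails there.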

Assumption 3 (Weight sequence) The weight sequence $\{\alpha_k\}$ is:

$\alpha_k = \frac{b_0}{a + k},$

where the constants $a$ and $b_0$ satisfy

(24)

The role of these conditions will become clear when we state our main result, Theorem 3, in Section IV.
Recall the sensing SNRs in (19). We make the following assumption on SSNR.

Assumption 4 $\mathrm{SSNR} > 0$.

Note that Assumption 4 is equivalent to having different mean vectors, $m_1 \neq m_0$. To obtain certain
specialized results, we will make an assumption stronger than Assumption 4.

Assumption 5 (Equal local sensing SNRs) $\mathrm{SSNR}_i = \mathrm{SSNR}_j > 0$, $\forall i \neq j$.

B. Asymptotic performance and Chernoff information

For Gaussian decision variables, as for the three detectors (isolated sensor, centralized, and
consensus+innovations distributed), and equal prior probabilities, the probability of error $P^e(k)$ is given by

$P^e(k) = Q\left( \sqrt{\mathrm{DSNR}_{\mathrm{gen}}(k)} \right),$ (25)

where $Q$ is the Q-function in (2) and $\mathrm{DSNR}_{\mathrm{gen}}(k)$ is the generic detector SNR.

To determine the exponential decay rate of the error probability, we recall the bounds [23]

$\frac{t}{1+t^2}\, e^{-t^2/2} \leq \sqrt{2\pi}\, Q(t) \leq \frac{1}{t}\, e^{-t^2/2}, \quad t > 0.$ (26)
We apply these bounds to (25). Taking the logarithm and dividing by $k$, the limsup of the right hand
side (rhs) inequality and the liminf of the left hand side (lhs) inequality in (26) lead to^5:

$\limsup_{k \to \infty} -\frac{1}{k} \log P^e(k) \leq \limsup_{k \to \infty} \frac{1}{2k}\, \mathrm{DSNR}_{\mathrm{gen}}(k)$ (27)

$\liminf_{k \to \infty} -\frac{1}{k} \log P^e(k) \geq \liminf_{k \to \infty} \frac{1}{2k}\, \mathrm{DSNR}_{\mathrm{gen}}(k).$ (28)

If the limsup in (27) is zero, we have two possibilities: 1) the error probability $P^e(k)$ decays to zero
slower than exponentially in $k$; or 2) $P^e(k)$ does not converge to zero at all. Intuitively, a large mean and
a fast-shrinking variance of the decision variables $\mathcal{D}_{\mathrm{gen}}(k)$ increase $\mathrm{DSNR}_{\mathrm{gen}}(k)$ and hence yield good
detection performance.

The detector for the isolated sensor $i$ given by equation (4) and the centralized detector in (5) are
the optimal minimum probability of error detectors (when prior probabilities are equal, the threshold
$T(k) = 0$). For these detectors, the lim sup and the lim inf in (27) and (28) actually coincide, i.e., the
sequence $\frac{1}{2k}\mathrm{DSNR}_{\mathrm{gen}}(k)$ has a limit and

$C_{\mathrm{gen}} = \lim_{k \to \infty} -\frac{1}{k} \log P^e_{\mathrm{gen}}(k) = \lim_{k \to \infty} \frac{\mathrm{DSNR}_{\mathrm{gen}}(k)}{2k}.$ (29)

These detectors maximize the exponential decay rate of the probability of error for the corresponding
problems; this optimal exponential decay rate is the Chernoff information $C_{\mathrm{gen}}$ as indicated in (29). For
the isolated sensor $i$ detector and the centralized detector, it is easily shown that their detector SNRs
$\mathrm{DSNR}_i(k)$ and $\mathrm{DSNR}(k)$ are given by^6

$\mathrm{DSNR}_i(k) = \frac{k}{4}\, \mathrm{SSNR}_i, \quad \mathrm{DSNR}(k) = \frac{k}{4}\, \mathrm{SSNR}.$ (30)

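The Chernoff informations for the isolated and the centralized optimal detectors are then:

$C_i = \lim_{k \to \infty} -\frac{1}{k} \log P^e_i(k) = \lim_{k \to \infty} \frac{\mathrm{DSNR}_i(k)}{2k} = \frac{1}{8}\, \mathrm{SSNR}_i$ (31)

$C = \lim_{k \to \infty} -\frac{1}{k} \log P^e(k) = \lim_{k \to \infty} \frac{\mathrm{DSNR}(k)}{2k} = \frac{1}{8}\, \mathrm{SSNR}.$ (32)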
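As a numerical sanity check of the centralized rate, the sketch below assumes that, for equal priors and a zero threshold, the error probability is $P^e(k) = Q(\sqrt{\mathrm{DSNR}(k)})$ with $\mathrm{DSNR}(k) = \frac{k}{4}\mathrm{SSNR}$, and evaluates the empirical rate $-\frac{1}{k}\log P^e(k)$ for an illustrative value $\mathrm{SSNR} = 1$ (our choice). It also evaluates the two Gaussian tail bounds in (26). Function names are ours.

```python
import math


def q_function(t):
    """Right-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def error_probability(k, ssnr):
    """P^e(k) = Q(sqrt(DSNR(k))) with DSNR(k) = k * SSNR / 4."""
    return q_function(math.sqrt(k * ssnr / 4.0))


def empirical_rate(k, ssnr):
    """Finite-k decay rate -(1/k) log P^e(k)."""
    return -math.log(error_probability(k, ssnr)) / k


def q_bounds(t):
    """Lower and upper bounds on Q(t) from (26), valid for t > 0."""
    g = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return t / (1.0 + t * t) * g, g / t


ssnr = 1.0
rate_200 = empirical_rate(200, ssnr)
rate_2000 = empirical_rate(2000, ssnr)
lower, upper = q_bounds(3.0)
```

The finite-$k$ rate sits slightly above $\mathrm{SSNR}/8 = 0.125$ and approaches it as $k$ grows, since the polynomial prefactor of the Gaussian tail contributes only a term of order $(\log k)/k$.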

Eqns. (31) and (32) justify the following definition of the global and local detectability, after which we 
relate global and local detectability to sensing (global and local) SNRs by a simple, but important fact. 



^5 Eqn. (28) holds because $\mathrm{DSNR}_{\mathrm{gen}}(k)$ is strictly positive (for large $k$) and can grow at most as $\sim k$ for either centralized,
isolated, or distributed detectors (as will be shown).

^6 We will see later that $\mathrm{DSNR}_{\mathrm{dis},i}(k)$ also grows at rate $k$ with our distributed detector (12).
Definition 1 The network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is globally detectable if the probability of error $P^e(k)$ of the optimal
centralized detector decays exponentially fast. The sensor $i \in \mathcal{V}$ is locally detectable if the probability
of error $P^e_i(k)$ of its optimal detector in isolation decays exponentially fast.

Fact 2 The network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is globally detectable if and only if $\mathrm{SSNR} > 0$. The sensor $i \in \mathcal{V}$ is
locally detectable if and only if $\mathrm{SSNR}_i > 0$.
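Fact 2 can be illustrated with a small example. The sketch below assumes a hypothetical three-sensor network with spatially uncorrelated sensing noise (diagonal $S_\zeta$, a simplifying assumption under which the global SSNR is the sum of the local ones) and takes $\mathrm{SSNR}_i = [m_1 - m_0]_i^2 / [S_\zeta]_{ii}$. All numerical values are illustrative.

```python
def local_ssnr(m0, m1, variances):
    """SSNR_i = ([m1 - m0]_i)^2 / [S_zeta]_ii for each sensor i."""
    return [(b - a) ** 2 / s for a, b, s in zip(m0, m1, variances)]


def global_ssnr_diagonal(m0, m1, variances):
    """(m1 - m0)^T S_zeta^{-1} (m1 - m0) when S_zeta is diagonal."""
    return sum(local_ssnr(m0, m1, variances))


# Only the third sensor observes a signal that differs between H_0 and H_1.
m0 = [0.0, 0.0, 0.0]
m1 = [0.0, 0.0, 1.2]
variances = [1.0, 1.0, 2.0]

ssnr_i = local_ssnr(m0, m1, variances)
ssnr = global_ssnr_diagonal(m0, m1, variances)

locally_detectable = [s > 0 for s in ssnr_i]
globally_detectable = ssnr > 0
chernoff_centralized = ssnr / 8.0  # C = SSNR / 8, see (32)
```

Here the network is globally detectable although two of the three sensors are not locally detectable, which is the situation where cooperation matters most.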

Clearly, a network can be globally detectable but have many (or most) sensors that are not locally
detectable; however, at least one sensor needs to be locally detectable so that global detectability holds. Our
goal is to carry out a similar asymptotic performance analysis for the distributed consensus+innovations
detector in (12), or in vector form in (14). Because $x_i(k)$ is Gaussian, relations (27) and (28) still apply.
Recall the mean and variance of the decision variable $x_i(k)$ under $H_1$, $\mu_i(k)$ and $\sigma_i^2(k)$, given by (15)
and (16). The distributed detector SNR, $\mathrm{DSNR}_{\mathrm{dis},i}(k)$, at sensor $i$ at time $k$ is then:

$\mathrm{DSNR}_{\mathrm{dis},i}(k) = \frac{\mu_i^2(k)}{\sigma_i^2(k)}.$

We obtain the moments $\mu_i(k)$ and $\sigma_i^2(k)$ of $x_i(k)$ and their asymptotic values in Section IV by
analyzing the distributed algorithm in (12). In contrast with the centralized and isolated detectors, these
statistics of the decision variable $x_i(k)$ of the $i$-th sensor are affected by the communication noise $\nu_{ij}(k)$,
see equation (10), through $v_i(k)$ and $v(k)$ in (13). We will see that, besides $\mathrm{DSNR}_{\mathrm{dis},i}(k)$, we
need to account for the impact of the communication gain $G_c$ given in (22).

IV. Consensus+Innovations Distributed Detection: Performance Analysis

Subsection IV-A studies the exponential decay of our consensus+innovations detector in (12), Subsection
IV-B addresses the optimality of the weight sequence $\{\alpha_k\}$, and Subsection IV-C addresses the potential
payoff of distributed detection arising from noisy cooperation among sensors.

A. Exponential decay of the error probability 

The next Theorem establishes under reasonable conditions that the probability of error at every sensor
of the consensus+innovations distributed detector in (12) decays exponentially fast. Recall the definitions
of SSNR and $G_c$ in (19) and (22), and the constants in (23).

Theorem 3 Consider the consensus+innovations distributed detector in (12) under Assumptions 1, 2,
3, and 4. Then:
1) The moments $\mu(k)$, $\mu_i(k)$, and $\sigma_i^2(k)$ satisfy:

$\mu_\infty := \lim_{k \to \infty} \mu(k) = (I + b_0 \mathcal{L})^{-1}\, m_\eta^{(1)}$ (33)



lim > ^SSNR(^l-^^^c,j (34) 

limsupfeafCfe) < -^SSNRfl + 3 + M V (35) 

2) The exponential decay rate of the error probability $P^e_{\mathrm{dis},i}(k)$ at every sensor $i$ satisfies:

li-mf-il„gi?,..(*) > isSNR 'r'^'; W- P6, 



Before proving the Theorem, which we carry out in Section V, we analyze how the bound on the rhs
of (36) depends on the different SNRs, on the network connectivity $\lambda_2(\mathcal{L})$, and on the parameter $b_0$ of
the weight sequence $\{\alpha_k\}$. The discussion is summarized in the following five remarks on Theorem 3.

1. Exponential decay of the error probability $P^e_{\mathrm{dis},i}(k)$. Under global detectability and connectedness,
Theorem 3 states that the error probability $P^e_{\mathrm{dis},i}(k)$ at every sensor $i$ decays exponentially to zero even if
sensor $i$ is (in isolation) not detectable ($\mathrm{SSNR}_i = 0$), and even when the communication links are very
noisy ($G_c > 0$ but small). This feature of the distributed detector (12) significantly improves over existing
work like $\mathcal{MD}$ in [5]. Namely, we prove in Appendix B that $\mathcal{MD}$ achieves only a subexponential decay
rate of order $e^{-c k^{\tau}}$, $\tau < 1$, $c > 0$, of the error probability, irrespective of $G_c$.

2. Effect of $G_c$. The bound on the rhs of (36) shows quantitatively that higher $G_c$ leads to better
detection, confirming the qualitative discussion in the Remark below (21).

3. Effect of the network connectivity $\lambda_2(\mathcal{L})$. Theorem 3 shows that the network connectivity plays a
role in the detection performance through the algebraic connectivity $\lambda_2(\mathcal{L})$. Larger values of $\lambda_2(\mathcal{L})$, which
allow for faster averaging, increase the bound (36), yielding a faster decay rate for the error probability.

4. Tradeoff: Communication noise vs. information flow. With optimal centralized detection (that
corresponds to a fully connected network and no additive communication noise) we have that, for all $i$,
$P_i^e(k) = P^e(k)$, and $\lim_{k\to\infty} -\frac{1}{k}\log P^e(k) = \frac{1}{8}\mathrm{SSNR}$. Then, from (36), all the three terms:

(37) 



N 



l+6oA2(£)''/* 

^ = ^ (39) 



(38) 
decrease the bound and so they quantify the decrease in performance of the distributed detector with
respect to the centralized detector. This decrease comes from two effects: 1) communication noise; and
2) insufficient information flow.

From (37)-(39), we can see how the parameter $b_0$ affects these two effects in opposing ways: the
terms (37) and (38) relate to the information flow, while the term (39) is due to communication noise.
We see that the net effect of increasing $b_0$ is to increase the effective algebraic connectivity ($b_0$ multiplies
$\lambda_2(\mathcal{L})$), increasing (37) and (38); on the other hand, it reduces CSNR as seen from (39).

The weight choice in (II-C) optimally balances these two effects if we tune the parameter $b_0$ to
maximize the right hand side in (36). This is a scalar optimization problem in $b_0$ and can easily be
solved numerically. Lemma 6 finds the optimal $b_0$ in closed form for a simplified case.
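The scalar tuning of $b_0$ can be sketched numerically. The snippet below is a minimal illustration, not the procedure used in the paper: it runs a golden-section search on a hypothetical stand-in objective `toy_bound`, which only mimics the qualitative shape of the tradeoff (one penalty that shrinks as $b_0$ grows, one that grows with $b_0$). The function names, the constants `lam2` and `noise`, and the search interval are assumptions made for illustration; to tune the actual detector, `toy_bound` would be replaced by the right hand side of (36).

```python
import math


def golden_section_maximize(f, lo, hi, tol=1e-8):
    """Maximize a unimodal scalar function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximizer lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximizer lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def toy_bound(b0, lam2=0.5, noise=0.2):
    """Hypothetical stand-in objective, NOT the bound (36) of the paper.

    1/(b0*lam2) plays the role of an information-flow penalty and
    noise*b0**2 the role of a communication-noise penalty.
    """
    return 1.0 / (1.0 + 1.0 / (b0 * lam2) + noise * b0 ** 2)


b0_star = golden_section_maximize(toy_bound, 1e-3, 50.0)
print(f"maximizer of the toy objective: b0 = {b0_star:.4f}")
```

For this toy objective the maximizer is available in closed form, $b_0 = (2 \cdot \mathrm{noise} \cdot \mathrm{lam2})^{-1/3}$, which gives a quick sanity check of the search.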

5. Tradeoff: Bias-variance. Theorem 3 reveals a certain bias-variance tradeoff. Ideally, we would like
the bias-free decision variables:



where $\mathbf{1}$ is the vector of ones; i.e., all sensors should have as asymptotic decision variable the asymptotic
centralized decision variable. That is, we want the mean of the decision variable at each sensor to converge
to the expected value of the centralized decision variable $\mathcal{D}(k)$ (scaled by $1/N$). Our algorithm (12)
introduces a bias (see (33) and (34)), but, on the other hand, it decreases the variance at the optimal
rate $1/k$. In contrast, $\mathcal{MD}$ in [5] does not have the bias, but it decreases the variance at a slower
rate. Compared to $\mathcal{MD}$, our algorithm (12) better resolves the bias-variance tradeoff in terms of the
detection performance; algorithm (12) decays the error probability exponentially, while $\mathcal{MD}$ decays it
subexponentially.

We now consider a special case where all sensors are identical, or, more precisely, they
operate under the same $\mathrm{SSNR}_i$, i.e., Assumption 5 holds. Theorem 3 takes a simplified form, where $\mu_\infty$
becomes bias free, as in (40). Further, = = 1 and the second condition in (24) becomes $b_0 > 0$; it can
also be shown (details omitted) that the factor 3 in (38) reduces to 1. The simplified Theorem 3 follows.

Theorem 4 (Asymptotic performance: Identical sensors) Let Assumptions 1 through 3 and 5 hold. Then, 
the exponential decay rate of the error probability at each sensor i satisfies: 




(40) 



liminf-ilogPdV,(A;) > ^SSNR 



k^oo k 




1 



(41) 



> ^SSNR 



8 




1 



(42) 
B. Optimality of the weight sequence $\{\alpha_k\}$

Order-optimality. We consider the role of the weight sequence (II-C); in particular, we show the
optimality of the rate $1/k$. To this end, we consider the distributed detector (12) but modify the weight
sequence; we refer to the modified sequence as $\beta_k$. We find an upper bound on the decay rate of the
error probability when the weight sequence is reset to $\beta_k$.

Theorem 5 Let Assumptions 1 through 3 and 5 hold. Suppose that the weight choice $\alpha_k$ in (II-C) is replaced
by:

$$\beta_k = \frac{b_0}{a + k^{\tau}},$$

where $\tau > 0$. We have:



limsup-^logP|ig^i(/c) < < 



if r < 1 

iSSNR, if r > 1 (43) 



|SSNR „ ^ ....... if r = 1. 



K Ai(S„) 



l+26oAjv(£) JVSSNR; 

Proof: See Appendix A. ■ 
Two remarks on Theorem 5 are in order. 

1. Order-optimality of $\alpha_k$. Theorem 5 says that the choice in (II-C) is the optimal weight choice
in the family of choices $\beta_k = \frac{b_0}{a + k^{\tau}}$, parametrized by $\tau > 0$. If $\beta_k$ decays too slowly ($\tau < 1$),
then the error probability converges to zero at a rate slower than exponential (if it converges to
zero at all). On the other hand, if $\beta_k$ decays too fast ($\tau > 1$), then the error probability does decay to zero
exponentially, but the rate is no better than the rate of the individual detection, irrespective of $G_c$.

2. Tightness of the bounds in Theorems 4 and 5. The upper bound (43) for $\tau = 1$ explains the
tightness of the lower bound in Theorem 4 and the unavoidable simultaneous effects of the communication
noise and information flow. The sequence $\{\alpha_k\}$ balances these via the parameter $b_0$.
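The three regimes of $\tau$ in Remark 1 can be illustrated numerically through the accumulated communication noise. The sketch below assumes that the noise contribution to $k\sigma_i^2(k)$ scales like $X_\beta(k) = \frac{1}{k}\sum_{j=1}^{k-1}(j\beta_j)^2$ with $\beta_j = b_0/(a + j^{\tau})$; this expression is an assumption read off from the $(j\alpha_j)$ weights on the noise terms in (46), and the function name and parameter values are illustrative only.

```python
def noise_accumulation(k, tau, b0=1.0, a=1.0):
    """Return (1/k) * sum_{j=1}^{k-1} (j * beta_j)^2 with beta_j = b0 / (a + j**tau).

    Assumed proxy for the communication-noise part of k * sigma_i^2(k).
    """
    total = 0.0
    for j in range(1, k):
        beta_j = b0 / (a + j ** tau)
        total += (j * beta_j) ** 2
    return total / k


horizon = 100_000
x_slow = noise_accumulation(horizon, tau=0.5)   # weights decay too slowly
x_unit = noise_accumulation(horizon, tau=1.0)   # the 1/k rate
x_fast = noise_accumulation(horizon, tau=1.5)   # weights decay too fast

print(f"tau=0.5: {x_slow:.3e}  (keeps growing with the horizon)")
print(f"tau=1.0: {x_unit:.6f}  (approaches b0**2)")
print(f"tau=1.5: {x_fast:.3e}  (vanishes, but so does cooperation)")
```

Under these assumptions the proxy grows without bound for $\tau < 1$, settles at $b_0^2$ for $\tau = 1$, and vanishes for $\tau > 1$, which matches the qualitative picture in the remark.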

Optimal $b_0$. We now find the $b_0$ that maximizes (42); we pursue (42) rather than (41) as it
allows simpler, closed form expressions. The proof of Lemma 6 follows after setting the derivative of the
denominator of the rhs in (42) to zero and is hence omitted for brevity.

Lemma 6 Let Assumptions 1 through 3 and 5 hold. The optimal parameter that maximizes (42) and 
K = -^TZ^^nTTTi^ (44)



the corresponding lower bound on liminffe_^oo — | log PP(A;), are, respectively: 

A2(£) 1/341/3 

where 

$c_0 = (1/2)^{1/3} + 4^{-2/3} = 3(2)^{-4/3} \approx 1.19.$

We use Lemma 6 to compare the distributed detector with the optimal centralized detector and the
optimal single sensor detector. In the very high $G_c$ regime (weak communication noise), when $G_c \to \infty$,
the distributed detector (at all sensors) achieves the asymptotic performance of the optimal centralized
detector. On the other hand, when $G_c$ decreases, at some point the rhs in (45) falls below $\frac{1}{8}\mathrm{SSNR}_i$
and the distributed detector (12) at sensor $i$ becomes worse than if sensor $i$ worked in isolation. The
discussion is formalized in the next Subsection, which considers when sensors should cooperate.

C. Communication payoff 

Eqn. (45) under low $G_c$ raises the question of whether a sensor $i$ should cooperate with its neighbors or
not. We next formalize the communication payoff.

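Definition 7 (Communication payoff) The network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ achieves communication payoff if:

$$\min_{i=1,\dots,N}\left\{\liminf_{k\to\infty} -\frac{1}{k}\log P^e_{\mathrm{dis},i}(k)\right\} > \max_{i=1,\dots,N}\left\{\frac{1}{8}\mathrm{SSNR}_i\right\}.$$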

Definition 7 says that the network achieves a communication payoff if the distributed detector error
performance of the worst sensor is better than the isolated detector error performance of the best sensor
without communication. Lemma 8 finds a threshold on $G_c$ above which it does pay off for sensors
to communicate with their neighbors. The proof of Lemma 8 is simple and is omitted.
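The comparison in Definition 7 can be written as a one-line check. The sketch below is only an illustration: it takes per-sensor decay rates as plain numbers, and the function name and the example values are hypothetical (they are not results from the paper). The isolated rates in the example are taken as $\mathrm{SSNR}_i/8$, following the right hand side of Definition 7.

```python
def achieves_communication_payoff(distributed_rates, isolated_rates):
    """Worst distributed decay rate must strictly exceed the best isolated rate."""
    return min(distributed_rates) > max(isolated_rates)


# Hypothetical numbers for a three-sensor network (illustration only).
ssnr_per_sensor = [2.0, 0.8, 0.0]                    # sensor 3 cannot detect alone
isolated = [ssnr / 8.0 for ssnr in ssnr_per_sensor]  # rates without cooperation
distributed_good = [0.30, 0.28, 0.27]                # weak communication noise
distributed_bad = [0.30, 0.28, 0.20]                 # one sensor hurt by noisy links

payoff_good = achieves_communication_payoff(distributed_good, isolated)
payoff_bad = achieves_communication_payoff(distributed_bad, isolated)
print(payoff_good, payoff_bad)
```

The criterion is deliberately strict: a single sensor whose distributed rate drops below the best isolated rate is enough for the network to miss the payoff.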

Lemma 8 Let Assumptions 1 through 3 and 5 hold. Set bo to the optimal value in (44). If 

then the network achieves the communication payoff in the sense of Definition 7. 

V. Proof of Theorem 3 
Subsection V-A sets up the analysis and Subsection V-B proves Theorem 3. 
A. Solution of the consensus+innovations distributed detector

Define the matrices $\Phi(k,j)$, $k \ge j \ge 1$, as follows:

$$\Phi(k,j) := \begin{cases} W(k-1)W(k-2)\cdots W(j) & \text{if } 1 \le j < k \\ I & \text{if } j = k. \end{cases}$$



Then, the solution to the distributed detector (14) is:

$$x(k) = \frac{1}{k}\sum_{j=1}^{k}\Phi(k,j)\,\eta(j) + \frac{1}{k}\sum_{j=1}^{k-1}(j\alpha_j)\,\Phi(k,j+1)\,v(j), \quad k = 1, 2, 3, \dots \qquad (46)$$
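The closed-form solution can be checked against a step-by-step recursion on a small example. The sketch below assumes the one-step form $x(k+1) = \frac{k}{k+1}W(k)x(k) + \frac{k\alpha_k}{k+1}v(k) + \frac{1}{k+1}\eta(k+1)$ with $x(1) = \eta(1)$, $W(k) = I - \alpha_k\mathcal{L}$ and $\alpha_k = b_0/(a+k)$; this form is inferred by unrolling the sum with weights $1/k$ and $j\alpha_j$, not copied from (14). The three-node path graph, the noise levels, and all names are illustrative assumptions.

```python
import random


def mat_vec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def weight_matrix(k, lap, b0, a):
    """W(k) = I - alpha_k * L with alpha_k = b0 / (a + k)."""
    alpha = b0 / (a + k)
    n = len(lap)
    return [[(1.0 if i == j else 0.0) - alpha * lap[i][j] for j in range(n)]
            for i in range(n)]


def apply_phi(k, j, vec, lap, b0, a):
    """Return Phi(k, j) @ vec = W(k-1) ... W(j) @ vec (identity when j == k)."""
    out = list(vec)
    for t in range(j, k):  # W(j) acts first, W(k-1) last
        out = mat_vec(weight_matrix(t, lap, b0, a), out)
    return out


def run_recursion(horizon, eta, v, lap, b0, a):
    """Iterate the assumed one-step update from x(1) = eta(1) up to x(horizon)."""
    x = list(eta[1])
    for k in range(1, horizon):
        alpha = b0 / (a + k)
        wx = mat_vec(weight_matrix(k, lap, b0, a), x)
        x = [(k * wx[i] + k * alpha * v[k][i] + eta[k + 1][i]) / (k + 1)
             for i in range(len(x))]
    return x


def closed_form(horizon, eta, v, lap, b0, a):
    """Evaluate the two sums of the closed-form solution at time `horizon`."""
    acc = [0.0] * len(lap)
    for j in range(1, horizon + 1):
        term = apply_phi(horizon, j, eta[j], lap, b0, a)
        acc = [s + t for s, t in zip(acc, term)]
    for j in range(1, horizon):
        alpha_j = b0 / (a + j)
        term = apply_phi(horizon, j + 1, v[j], lap, b0, a)
        acc = [s + j * alpha_j * t for s, t in zip(acc, term)]
    return [s / horizon for s in acc]


LAP = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]  # path graph
B0, A_OFFSET, HORIZON = 0.5, 1.0, 12
rng = random.Random(0)
eta = {k: [rng.gauss(0.5, 1.0) for _ in range(3)] for k in range(1, HORIZON + 1)}
v = {k: [rng.gauss(0.0, 0.3) for _ in range(3)] for k in range(1, HORIZON)}

x_rec = run_recursion(HORIZON, eta, v, LAP, B0, A_OFFSET)
x_cf = closed_form(HORIZON, eta, v, LAP, B0, A_OFFSET)
max_gap = max(abs(p - q) for p, q in zip(x_rec, x_cf))
print(f"largest entrywise gap between recursion and closed form: {max_gap:.2e}")
```

The two computations agree up to floating-point rounding, which is a convenient way to catch index slips (for example $\Phi(k,j)$ versus $\Phi(k,j+1)$) when implementing the detector.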



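Introduce

$$\widetilde{W}(k) := W(k) - J, \qquad \widetilde{\Phi}(k,j) := \widetilde{W}(k-1)\widetilde{W}(k-2)\cdots\widetilde{W}(j), \quad k > j \ge 1.$$

It can be seen that

$$\widetilde{\Phi}(k,j) = \Phi(k,j) - J.$$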



In consensus, W{k) J, where J is the ideal consensus averaging matrix. The matrix W{k) and its 
norm measure, in a sense, the imperfection in the information flow, i.e., how far W{k) is away 

from and W{k) from J. If (24) holds then it is easy to see that 

. (47) 

We see that the role of $a$ in $\alpha_k$ is to be an offset that enables (47) to hold for all $k$; that is, $a$ reduces
$\|\widetilde{W}(k)\|$ for large $b_0$ and small $k$. We will see that $b_0$ is the effective tuning parameter that controls the
detection performance. We also comment that the ratio $\lambda_2(\mathcal{L})/\lambda_N(\mathcal{L})$ is maximized by Ramanujan networks,
see [24] for details.

B. Proof 

Proof of Theorem 3: Claims (33) and (34). We first study the mean of the decision variable, $\mu(k)$.
It evolves according to the following recursion (which can be seen by taking the expectation in (14)):

$$\mu(k+1) = \frac{k}{k+1}W(k)\mu(k) + \frac{1}{k+1}m^{(1)}. \qquad (48)$$

Next, we consider the error $e(k)$ of $\mu(k)$ with respect to the assumed $\mu_\infty$ given in (33):

$$e(k) := \mu(k) - (I + b_0\mathcal{L})^{-1}m^{(1)}.$$

We will show that $e(k) \to 0$, which implies (33). Algebraic manipulations show that $e(k)$ satisfies:

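$$e(k+1) = \frac{k}{k+1}W(k)e(k) + \frac{1}{k+1}\Gamma(k)m^{(1)}, \qquad (49)$$

where

$$\Gamma(k) = I - (k+1)(I + b_0\mathcal{L})^{-1} + kW(k)(I + b_0\mathcal{L})^{-1}
= I - (k+1)(I + b_0\mathcal{L})^{-1} + k\left(I - \frac{b_0}{a+k}\mathcal{L}\right)(I + b_0\mathcal{L})^{-1}.$$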

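Recall the eigendecomposition of the Laplacian in (3). The matrix $\Gamma(k)$ has the same eigenvectors as $\mathcal{L}$;
simple calculations show that the eigenvalue $\lambda_i(\Gamma(k))$ that corresponds to the eigenvector $q_i$ is:

$$\lambda_i(\Gamma(k)) = \begin{cases} 0 & \text{if } i = 1 \\ \dfrac{a}{a+k}\,\dfrac{b_0\lambda_i(\mathcal{L})}{1 + b_0\lambda_i(\mathcal{L})} & \text{otherwise.} \end{cases}$$

Then, clearly, for some $c_\Gamma > 0$,

$$\|\Gamma(k)\| \le \frac{c_\Gamma}{k}. \qquad (50)$$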
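The eigenvalue computation behind (50) can be verified numerically. The sketch below compares the eigenvalue of $\Gamma(k) = I - (k+1)(I+b_0\mathcal{L})^{-1} + k(I - \frac{b_0}{a+k}\mathcal{L})(I+b_0\mathcal{L})^{-1}$ along a Laplacian eigenvector with eigenvalue `lam` against the simplified form $\frac{a}{a+k}\cdot\frac{b_0\,\mathrm{lam}}{1+b_0\,\mathrm{lam}}$. The simplified form is obtained here by direct substitution and should be read as a reconstruction; the function names and the parameter grid are illustrative assumptions.

```python
def gamma_eigenvalue_direct(k, lam, b0, a):
    """Eigenvalue of Gamma(k) evaluated term by term from its definition."""
    inv = 1.0 / (1.0 + b0 * lam)          # eigenvalue of (I + b0 L)^{-1}
    w_eig = 1.0 - b0 * lam / (a + k)      # eigenvalue of W(k) = I - b0/(a+k) L
    return 1.0 - (k + 1) * inv + k * w_eig * inv


def gamma_eigenvalue_closed(k, lam, b0, a):
    """Simplified form: a/(a+k) * b0*lam / (1 + b0*lam)."""
    return a * b0 * lam / ((a + k) * (1.0 + b0 * lam))


B0, A_OFFSET = 0.7, 2.0
worst_gap = 0.0
worst_scaled = 0.0
for k in (1, 2, 5, 10, 100, 1000):
    for lam in (0.0, 0.3, 1.0, 2.5, 4.0):
        direct = gamma_eigenvalue_direct(k, lam, B0, A_OFFSET)
        closed = gamma_eigenvalue_closed(k, lam, B0, A_OFFSET)
        worst_gap = max(worst_gap, abs(direct - closed))
        worst_scaled = max(worst_scaled, k * closed)

print(f"largest mismatch: {worst_gap:.2e}")
print(f"largest k * eigenvalue: {worst_scaled:.4f} (stays below a = {A_OFFSET})")
```

Since $k$ times the eigenvalue stays below $a$ for every $k$ and every Laplacian eigenvalue, a bound of the form $c_\Gamma/k$ on the norm follows, with $c_\Gamma = a$ being one admissible choice in this sketch.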



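We now decompose $e(k)$ into the consensus subspace, i.e., the component collinear with the vector $\mathbf{1}$, and
the component orthogonal to $\mathbf{1}$: $e(k) = (I - J)e(k) + Je(k) = (I - J)e(k) + \left(\frac{1}{N}\mathbf{1}^\top e(k)\right)\mathbf{1}$. We show
separately that:

$$\lim_{k\to\infty}(I - J)e(k) = 0 \qquad (51)$$

$$\lim_{k\to\infty}\mathbf{1}^\top e(k) = 0. \qquad (52)$$

Then, (51) and (52) together imply that

$$\lim_{k\to\infty}e(k) = 0. \qquad (53)$$

We first show (52). Multiplying (49) from the left by $\mathbf{1}^\top$, using the orthogonality of the eigenvectors $q_i$
(so that $\mathbf{1}^\top\Gamma(k) = 0$), and using the fact that $\mathbf{1}^\top W(k) = \mathbf{1}^\top$, we get:

$$\mathbf{1}^\top e(k+1) = \frac{k}{k+1}\mathbf{1}^\top e(k),$$

which implies (52), because the recursion telescopes to $\mathbf{1}^\top e(k) = \frac{1}{k}\mathbf{1}^\top e(1)$.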



August 9, 2011 



DRAFT 






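We now show (51). Denote by

$$b := b_0\lambda_2(\mathcal{L}).$$

Multiplying (49) from the left by $(I - J)$, we get:

$$\begin{aligned}
(I - J)e(k+1) &= \frac{k}{k+1}(I - J)W(k)e(k) + \frac{1}{k+1}(I - J)\Gamma(k)m^{(1)} \\
&= \frac{k}{k+1}\widetilde{W}(k)(I - J)e(k) + \frac{1}{k+1}\Gamma(k)m^{(1)}, \qquad (54)
\end{aligned}$$

where (54) holds because $JW(k) = J$, $\widetilde{W}(k)J = 0$, and $(I - J)\Gamma(k) = \Gamma(k)$. Now, by subadditivity
and submultiplicativity of norms, together with (47) and (50), (54) yields:

$$\begin{aligned}
\|(I - J)e(k+1)\| &\le \frac{k}{k+1}\|\widetilde{W}(k)\|\,\|(I - J)e(k)\| + \frac{1}{k+1}\|\Gamma(k)\|\,\|m^{(1)}\| \\
&\le \left(1 - \frac{b}{a+k}\right)\|(I - J)e(k)\| + \frac{c_\Gamma\|m^{(1)}\|}{k^2} \\
&= \|(I - J)e(k)\| - \frac{b}{a+k}\|(I - J)e(k)\| + \frac{c_\Gamma\|m^{(1)}\|}{k^2}. \qquad (55)
\end{aligned}$$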

Before proceeding, we invoke the following deterministic variant of a result due to Robbins and
Siegmund (Lemma 11, Chapter 2.2, [25]).

Lemma 9 ([25]) Let $\{u(k)\}$, $\{\rho(k)\}$, and $\{\kappa(k)\}$ be non-negative deterministic (scalar) sequences.
Further, suppose that

$$u(k+1) \le u(k) - \rho(k) + \kappa(k), \quad k = 1, 2, \dots$$

Suppose that $\sum_{k=1}^{\infty}\kappa(k) < \infty$. Then: 1) $\sum_{k=1}^{\infty}\rho(k) < \infty$; and 2) $\lim_{k\to\infty}u(k) = u^\star$ exists.

We apply Lemma 9 to (55) with

$$u(k) = \|(I - J)e(k)\|, \quad \rho(k) = \frac{b}{a+k}\|(I - J)e(k)\|, \quad \kappa(k) = \frac{c_\Gamma\|m^{(1)}\|}{k^2}.$$



This proves that

$$\lim_{k\to\infty}\|(I - J)e(k)\| = 0,$$

i.e., proves (51). Namely, by Lemma 9, we have that

$$\sum_{k=1}^{\infty}\rho(k) = \sum_{k=1}^{\infty}\frac{b}{a+k}\|(I - J)e(k)\| < \infty,$$

which, because $\sum_{k=1}^{\infty}\frac{b}{a+k} = \infty$, implies that

$$\liminf_{k\to\infty}\|(I - J)e(k)\| = 0.$$

Also, by Lemma 9, $\lim_{k\to\infty}u(k) = \lim_{k\to\infty}\|(I - J)e(k)\|$ exists, and, hence, $\lim_{k\to\infty}\|(I - J)e(k)\| = 0$.

This completes the proof of (53).
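The convergence argument can be illustrated on the scalar comparison recursion that the norm bound leads to. The sketch below iterates $u(k+1) = \left(1 - \frac{b}{a+k}\right)u(k) + \frac{c}{k^2}$, i.e., the bound (55) taken with equality; the values of `b`, `a`, `c` and the starting point are arbitrary illustrative choices (with $b < a + 1$ so that every factor is positive), not constants from the paper.

```python
def iterate_bound(u_start, b, a, c, steps):
    """Iterate u(k+1) = (1 - b/(a+k)) * u(k) + c / k**2 for k = 1, ..., steps."""
    u = u_start
    for k in range(1, steps + 1):
        u = (1.0 - b / (a + k)) * u + c / k ** 2
    return u


B, A_OFFSET, C = 1.5, 1.0, 1.0
u_100 = iterate_bound(1.0, B, A_OFFSET, C, 100)
u_10000 = iterate_bound(1.0, B, A_OFFSET, C, 10_000)
print(f"u after 100 steps:    {u_100:.3e}")
print(f"u after 10000 steps:  {u_10000:.3e}")
```

The contraction factor $1 - b/(a+k)$ approaches one, so convergence to zero relies on two facts used in the proof: the perturbation $c/k^2$ is summable, and the step sizes $b/(a+k)$ are not.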

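We now prove (34) using (33). Note first that

$$(I + b_0\mathcal{L})^{-1} = Q\,\Lambda\big((I + b_0\mathcal{L})^{-1}\big)\,Q^\top, \qquad (56)$$

where

$$\Lambda\big((I + b_0\mathcal{L})^{-1}\big) = \mathrm{Diag}\big(1, (1 + b_0\lambda_2(\mathcal{L}))^{-1}, \dots, (1 + b_0\lambda_N(\mathcal{L}))^{-1}\big).$$

Thus, using the fact that $q_1 = \frac{1}{\sqrt{N}}\mathbf{1}$ and $J = q_1q_1^\top$, the matrix $(I + b_0\mathcal{L})^{-1}$ decomposes as:

$$(I + b_0\mathcal{L})^{-1} = J + Q\Lambda'Q^\top, \qquad (57)$$

with $\Lambda' = \mathrm{Diag}\big(0, (1 + b_0\lambda_2(\mathcal{L}))^{-1}, \dots, (1 + b_0\lambda_N(\mathcal{L}))^{-1}\big)$. Multiplying (57) from the right by $m^{(1)}$
and using (8), we get that the entry $[\mu_\infty]_i$ equals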



[/^ook = ;5^SSNR+ QA'QT^D 



2N 

^ = TT6i(Z)ll"^?^ll yields (34). 



(58) 



Finally, the inequality 

We now prove (35); we use the following auxiliary result 

Lemma 10 Denote by: 

j=i ' J / 

^ k-l 

i=i 

Then, for all i, the following holds: 



(k) 



1 



limsupA;af(A;) < -^SSNR + 3||5^||Z* + ||5^||x*. (59) 



k—i-OO 
Moreover, we have:

$$X^\star = b_0^2. \qquad (61)$$

Proof of Lemma 10: Consider (46). Using the independence of $\eta(j)$ and $\eta(k)$, $k \ne j$, and the
independence of $\eta(k)$ and $v(j)$, for all $k, j$, and using the equality $\Phi(k,j) = \widetilde{\Phi}(k,j) + J$, we have:

k , fc-l 



^iik) = ^X^Var(e7$(fc,i)r?(i))+^X;(a,i)2Var(e7$(fc,i + lH^^^^ 

11 2 ^ ^ 

= /c [eJjSrjJci^ + ;pe7(/ - J)5„(/ - J)ei + J] J5^^(A;, jO^e^ 

fc-i fc-i 



i=i j=i 
Straightforward algebra shows: 

eJjSrjJci = -^SSNR. 

We next bound from above the quantity $k\sigma_i^2(k)$, using (V-B) and the following norm arguments: 1)
$\|AB\| \le \|A\|\,\|B\|$; 2) $\|Ab\| \le \|A\|\,\|b\|$, for square matrices $A$ and $B$, and a vector $b$; 3) $\|e_i\| = 1$; 4)
$\|\Phi(k,j+1)\| = 1$. (The latter claim holds because $\Phi(k,j+1)$ is doubly stochastic.) The bound on $k\sigma_i^2(k)$
is as follows:



kal{k) < i^SSNR+|||5,||5^||$(fe,j) 



i=i i=i 
= ^SSNR + 3||5^||Z(fc) + 115^11 x(A;) + ^e7 (I -J)5^(/-J)ei. (62) 

Taking the lim sup in (62) yields (59). 

We now prove (60). Note that Z{k) can be written via the following recursion: 

Z{k + 1) = (l-—^)(J^Z{k)+ ^ 



a + k + lj \k + l k + 1 

Z{1) = 1 ^>0. 

^ ' a + 1 

The proof of (60) proceeds analogously to the proof of (51), except that the vector quantity $(I - J)e(k)$
is replaced by the scalar $Z(k)$, and the vector $m^{(1)}$ is replaced by the scalar 1. The proof of (61) is
trivial. Theorem 3 now follows by combining (34) and (35) with (28) to obtain (36). ■

VI. Extensions to Non-Gaussian case 

We have characterized the exponential decay rate of the error probability in the case of Gaussian
(spatially correlated and time-uncorrelated) sensing noise, and Gaussian (spatially correlated and
time-uncorrelated) additive communication noise. Our results, to a certain degree, extend to the case when:
1) the zero mean sensing noise is spatially and temporally independent, but with a generic distribution
with finite second moments; and 2) the zero mean additive communication noise is spatially correlated,
temporally independent, and with a generic distribution and finite second moment. In this case, Theorem 3
remains valid, and all the steps in proving Theorem 3, equations (33)-(35), still go through in the
generalized case (see Appendix). We now explain the implications of Theorem 3 in the general
non-Gaussian model.

Consider $\mathrm{DSNR}_i(k)$ in (30). In the Gaussian case, $\mathrm{DSNR}_i(k)$ determines the exponential decay
rate of the error probability, as verified by equations (27) and (28). In general, this is no longer the
case as higher order moments play a role; however, $\mathrm{DSNR}_i(k)$ still gives a good estimate for detection
performance; see, e.g., [26], [27], [28]. With the optimal centralized detector, we have that:

$$\lim_{k\to\infty}\frac{\mathrm{DSNR}(k)}{k} = \frac{1}{4}\mathrm{SSNR}.$$
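The link between the growth of DSNR and the error exponent in the Gaussian case can be checked numerically. The sketch below assumes that the error probability of a Gaussian decision variable is $P^e(k) = Q\big(\sqrt{\mathrm{DSNR}(k)}\big)$, with $Q$ the standard Gaussian tail function, and that $\mathrm{DSNR}(k) = k\,\mathrm{SSNR}/4$ exactly; both are simplifying assumptions for illustration (the precise relations are (27) and (28) in the paper), and the function names are hypothetical.

```python
import math


def q_function(x):
    """Standard Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def empirical_rate(k, ssnr):
    """-(1/k) log P^e(k) under the assumption P^e(k) = Q(sqrt(k * ssnr / 4))."""
    dsnr = k * ssnr / 4.0
    return -math.log(q_function(math.sqrt(dsnr))) / k


SSNR = 4.0
rate_100 = empirical_rate(100, SSNR)
rate_400 = empirical_rate(400, SSNR)
print(f"k=100: {rate_100:.4f}")
print(f"k=400: {rate_400:.4f}")
print(f"SSNR/8: {SSNR / 8.0:.4f}")
```

The finite-$k$ rates sit slightly above $\mathrm{SSNR}/8$ and approach it as $k$ grows, because the polynomial prefactor of the Gaussian tail contributes a term of order $\log(k)/k$ to the exponent.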

Theorem 3 implies that, with our distributed detector (14), the following holds for all sensors $i$:

fe^oo k ~4 h I q N „ Nbl\ 

First, eqn. (63) says that $\mathrm{DSNR}_{i,\mathrm{dis}}(k)$ grows as $\Omega(k)$, as with the optimal centralized detector. This contrasts
with $\mathcal{MD}$ in [5], which achieves only $O(k^{\tau})$, $\tau < 1$. Second, as before with Theorem 3, (63) reveals
the tradeoff between the information flow and communication noise.

VII. Conclusion 

We designed a consensus+innovations distributed detector that achieves exponential decay rate of the
detection error probability at all sensors under noisy communication links, even when certain (or
most) sensors in isolation cannot perform successful detection. This improves over existing work like
[5] that achieves a strictly slower rate. We showed how our distributed detector optimally weighs the
neighbors' messages via the optimal sequence $\{\alpha_k\}$, balancing the two opposing effects: communication
noise and information flow. We found a threshold on the communication noise power below which a
sensor that successfully detects the event in isolation still improves its performance through cooperation
over noisy links.

Appendix 

A. Proof of Theorem 5 

Define first the following quantities: 

k 

^ fc-i 

K ^ — ' K— >-oo 

We will need the following two Lemmas (11 and 12), of which Theorem 5 is a direct corollary. 

Lemma 11 Let Assumptions 1 through 3 hold. In addition, assume Assumption 5. For the weight sequence
$\beta_k = \frac{b_0}{a + k^{\tau}}$, the following holds:



7* 



= if T < 1 

if T > 1 (64) 
if r = 1 



I, - l+26o^ 



+00 if T < 1 

if T > 1 (65) 
hi if T = L 



Lemma 12 



lim \ixi{k)\ = ^SSNRi (66) 

fc— >oo 2 

liminffcaKA;) > ^SSNR^ + SSNR^Z^ + ^^I^X^- (67) 

Proof of Lemma 11: We start by proving (64) for $\tau < 1$. To this end, note that $Z_\beta(k)$ updates
according to the following recursion:

2 



(68) 
where $b' := b_0\lambda_N(\mathcal{L})$. By (68), for sufficiently large $k_0$, and for all $k \ge k_0$, we have that:

for appropriately chosen bf^ > 0. Now, applying Lemma 4 in [29], we get that limfc^oo 2^fs{k) = 0. 

We proceed by proving (64) for $\tau > 1$. By (68), and using the fact that $Z_\beta(k) \le 1$, $\forall k$, the quantity
$Z_\beta(k+1)$ can be bounded from below as follows:

Zaik + l) > T-^fl 7T—^^2.sik)+(l , ^ (69) 

' - k + \\ a + {k + \YJ V a+{k + iyjk + l 

2b' k ^ / 2b' \ 1 

- kT~i^^^^ Jk^WTW+m^^ ^^V a + {k + iy)kn 

k ^ 2b" 1 

^ kT-1^^^'^ - 1^ ^ kT-v 

for appropriately chosen $b'' > 0$, and for all $k \ge k_1$, where $k_1$ is sufficiently large. Now, consider the
recursion:

k 1 2b" 

U{k + \) = —-U{k) + -—-—, k = ki,ki + l,... (70) 

k + I k + 1 k^ 

U{ki) = Zfi{ki). 

Clearly, $Z_\beta(k) \ge U(k)$, for all $k \ge k_1$. Subtracting 1 from both sides in (70) and applying Lemma 9
yields $U(k) - 1 \to 0$, and, hence, $\liminf_{k\to\infty}Z_\beta(k) \ge 1$; on the other hand, $Z_\beta(k) \le 1$, for all $k$, and,
hence, (64) for $\tau > 1$ holds.

To prove (64) for $\tau = 1$, consider (69); as $\tau = 1$, we have:

k f 2b' \ 1 2b' 

^"'^ + " ^ FkT (' - liTkTl) ^'''"^ ^ kTl - WTW' " = - 

Now, define the recursion 

Similarly to the proof of (33) in Theorem 3, it can be shown that V{k) iq^- Noting that Z^{k) > 
V{k), k = l,2, ... yields (64) for r = L 

The proofs of (65) for r < 1, r > 1, and r = 1 are trivial. ■ 
(72)



Proof of Lemma 12: Note that, under the assumptions of Lemma 12, $S_\eta = \mathrm{SSNR}_i\,I$. Thus, in (62),
the term

2 ~ 16 1 ~ 

p5^e7j5,$(A;, j)^e, = -^-SSNR,5^e7j$(fc, j)^e, = 0, (71) 

3=1 j=l 

because $J\widetilde{\Phi}(k,j) = 0$. Multiplying (62) by $k$, and using (71), we get:

1 SSIMT? ^ ^ ^ 1 ^ / 

ka^k) = -SSNR, + —-^J24^ik,mkjVe, + -J2ij(3,f (e7$(fc, j)5.$(fc, j 

j=l 3=1 

We next bound $k\sigma_i^2(k)$ from below, using the following simple relations:

$$b^\top Ab \ge \lambda_1(A)\|b\|^2, \qquad \|e_i\| = 1$$
rnkj^af > ^, (73) 

where (73) holds true because $\Phi(k,j)$ is doubly stochastic. The lower bound on $k\sigma_i^2(k)$ is as follows:

kaUk) > ^ssNR, + ^1 {mmkjy) + ^ E (j' f^j)' MSvnmj + ^V^if 

3=1 3=1 

^ -ssNR. + ^ f: {^'s=3 (1 - >^N{mr) ^^(s.) 

3=1 3=1 

IssNR. + ^ Y: (nt.-. (1 - Mmr) +l^Y U ^3? 

3=1 3=1 

= ^^ + SSNR.Z,{k) + ^Mk). (74) 
Taking the liminf in (74) yields (67). ■ 

B. Decay rate of the error probability for the $\mathcal{MD}$ algorithm in [5]

We show that, under Gaussian assumptions, with the algorithm $\mathcal{MD}$ in ([5], eqn. (14)), the error
probability decays to zero at a rate slower than exponential. Recall that $x_i(k)$, $\mu_i(k)$, $\sigma_i^2(k)$, and $P_i^e(k)$
are the sensor $i$'s decision variable, its mean and variance under $H_1$, and its local error probability,
respectively. (Now the latter quantities correspond to $\mathcal{MD}$ and no longer to (12).) Denote by $P^e(k)$ the
worst error probability at time $k$ among sensors:

$$P^e(k) = \max_{i=1,\dots,N}P_i^e(k). \qquad (75)$$



1 

> — . 

N 



v) 
We show that:^

$$P^e(k) = \Omega\left(e^{-ck^{\tau}}\right), \quad 0.5 < \tau < 1, \quad c > 0. \qquad (76)$$

Denote by $x(k)$ the vector of the $x_i(k)$'s, as before, and $\Sigma(k) := \mathrm{Cov}(x(k))$. We will prove (76) by showing:

$$\mathrm{tr}(\Sigma(k)) = \Omega\left(\frac{1}{k^{\tau}}\right). \qquad (77)$$
Namely, with MV, limfe_^oo A*i(^) = g^SSNR, Vi Thus, for all k > k', for appropriate k' > 0: 

Vi=i,...,iv o-j (A;) y \^maxj=i,...,jv y I Jj^ti{^{k)) J 

for all k > k' and for appropriately chosen Cp > 0. Now, applying the upper bound on the Q function 
in (26) to (78) yields (76). It remains to show (77). Denote by $W(k) := I - \beta_k\mathcal{L} - \alpha_kI$ the updating
matrix in the $\mathcal{MD}$ algorithm, where $\beta_k = \frac{b}{(k+1)^{\tau}}$, $\tau \in (0.5, 1)$, and $\alpha_k = \frac{a}{k+1}$, $a, b > 0$. In our notation,
the update rule for $x(k)$ with $\mathcal{MD}$ is as follows:

$$x(k+1) = W(k)x(k) + \beta_kv(k) + \alpha_k\eta(k).$$

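It can be shown that the covariance matrix $\Sigma(k) := \mathrm{Cov}(x(k))$ satisfies the following recursion:

$$\Sigma(k+1) = W(k)\Sigma(k)W(k)^\top + \alpha_k^2S_\eta + \beta_k^2S_v. \qquad (79)$$

(Here $S_\eta$ and $S_v$ denote respectively the covariance matrix of the innovations $\eta(k)$ and of the
communication noise $v(k)$, as before.) Taking the trace in (79) and after algebraic manipulations, we get:

$$\begin{aligned}
\mathrm{tr}(\Sigma(k+1)) &\ge \lambda_1\big(W(k)^\top W(k)\big)\,\mathrm{tr}(\Sigma(k)) + \alpha_k^2\,\mathrm{tr}(S_\eta) + \beta_k^2\,\mathrm{tr}(S_v) \\
&\ge \big(1 - \alpha_k - \beta_k\lambda_N(\mathcal{L})\big)^2\,\mathrm{tr}(\Sigma(k)) + \beta_k^2\,\mathrm{tr}(S_v) \\
&\ge \big(1 - 2(\alpha_k + \beta_k\lambda_N(\mathcal{L}))\big)\,\mathrm{tr}(\Sigma(k)) + \beta_k^2\,\mathrm{tr}(S_v)
\end{aligned}$$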



(k + iyj^ ^ " (A: + 1)2t' 
^By Theorem 3, with (12), $P^e(k) = O(e^{-ck})$, hence better than $\mathcal{MD}$.
for all $k \ge k_2$ and $k_2$ sufficiently large, for appropriately chosen $c_5, c_6 > 0$. Now, introduce $\gamma(k) :=
\mathrm{tr}(\Sigma(k))\,(k+1)^{\tau}$. Then, we have:

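Now, consider the sequence $S(k)$ that evolves according to the recursion:

$$S(k+1) = \left(1 - \frac{c_5}{(k+1)^{\tau}}\right)S(k) + \frac{c_6}{(k+1)^{\tau}}, \quad k = k_2, k_2+1, \dots, \qquad S(k_2) = \gamma(k_2). \qquad (80)$$

Clearly, $\gamma(k) \ge S(k) > 0$, for all $k = k_2, k_2+1, \dots$ It is easy to show that

$$\lim_{k\to\infty}S(k) = \frac{c_6}{c_5}. \qquad (81)$$

Namely, subtracting $\frac{c_6}{c_5}$ from both sides of equality (80) yields:

$$S(k+1) - \frac{c_6}{c_5} = \left(1 - \frac{c_5}{(k+1)^{\tau}}\right)\left(S(k) - \frac{c_6}{c_5}\right),$$

which in turn implies (81); now, (81) implies that $\gamma(k) = \Omega(1)$. Hence, (77) holds.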
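The limit (81) can be checked on a short simulation. The sketch below assumes the comparison recursion has the form $S(k+1) = \left(1 - \frac{c_5}{(k+1)^{\tau}}\right)S(k) + \frac{c_6}{(k+1)^{\tau}}$, with both terms sharing the step size $(k+1)^{-\tau}$; this form, the constant names `c5` and `c6`, and all numerical values are assumptions for illustration, chosen so that the contraction factor stays in $(0, 1)$ from the starting index on.

```python
def iterate_s(s_start, c5, c6, tau, k_start, steps):
    """Iterate S(k+1) = (1 - c5/(k+1)**tau) * S(k) + c6/(k+1)**tau."""
    s = s_start
    for k in range(k_start, k_start + steps):
        step = 1.0 / (k + 1) ** tau
        s = (1.0 - c5 * step) * s + c6 * step
    return s


C5, C6, TAU = 2.0, 3.0, 0.75
s_from_above = iterate_s(10.0, C5, C6, TAU, k_start=10, steps=5000)
s_from_below = iterate_s(0.1, C5, C6, TAU, k_start=10, steps=5000)
print(f"limit from above: {s_from_above:.9f}")
print(f"limit from below: {s_from_below:.9f}")
print(f"c6 / c5:          {C6 / C5:.9f}")
```

Both starting points end at $c_6/c_5$, since the deviation $S(k) - c_6/c_5$ is multiplied by $1 - c_5/(k+1)^{\tau}$ at every step and the step sizes are not summable for $\tau < 1$.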